Add KV cache quantization types #114
3486867 to aa568e0
@@ -0,0 +1,14 @@
import { z } from "zod";
Since we'll (probably?) have MLX types here too, let's drop `Llama` from the file name. This is the pattern we use in `LLMLoadModelConfig.ts`. Btw, shouldn't it just live in that file?
I deleted this file, but kept llama in the variable names since the MLX KV cache quantization implementation requires a different type.
In variable name 👍
  LLMLlamaCacheQuantizationType,
  llmLlamaCacheQuantizationTypes,
  llmLlamaCacheQuantizationTypeSchema,
} from "./llm/LLMLlamaCacheQuantizationType";
Suggested change:
- } from "./llm/LLMLlamaCacheQuantizationType";
+ } from "./llm/LLMLlamaCacheQuantizationType.js";
but probably should move to `LLMLoadModelConfig` anyway
I moved these to LLMLoadModelConfig
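For readers following the thread, here is a rough sketch of how the three exports discussed above (the value list, the derived type, and a validator) could relate once they live next to the load config. It is not the code from this PR: the quantization values are illustrative assumptions, since the real list is not shown in this conversation, and a hand-written type guard stands in for the zod schema (`llmLlamaCacheQuantizationTypeSchema`) so the snippet has no dependencies. `isLLMLlamaCacheQuantizationType` is a hypothetical helper name.

```typescript
// Assumed values for illustration only; the PR's actual list is not shown here.
const llmLlamaCacheQuantizationTypes = ["f32", "f16", "q8_0", "q4_0"] as const;

// Union type derived from the array, so the two cannot drift apart.
type LLMLlamaCacheQuantizationType = (typeof llmLlamaCacheQuantizationTypes)[number];

// Hypothetical stand-in for the zod schema: a runtime check that narrows
// an unknown value to the llama-specific quantization type.
function isLLMLlamaCacheQuantizationType(
  value: unknown,
): value is LLMLlamaCacheQuantizationType {
  return (
    typeof value === "string" &&
    (llmLlamaCacheQuantizationTypes as readonly string[]).indexOf(value) !== -1
  );
}
```

Keeping `Llama` in these names leaves room for a separate MLX type later, as noted above, without changing the llama-specific union.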
f9009cf to 6f1e8f5
6f1e8f5 to 13c4a57
Adds KV cache quantization configuration